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IV 


1.  Introduction 


Many  problems  in  materials  design  involve  optimizing  semi-crystalline  polymeric 
materials  with  respect  to  their  dynamical  behavior  or  nonequilibrium  properties. 
These  problems  pose  substantial  modeling  challenges  that  do  not  afflict  crystalline 
or  amorphous  materials  or  raise  questions  about  thermodynamics.  First,  length  scales 
on  the  order  of  10  qm  are  required  to  model  many  semi-crystalline  materials  or 
rare-event  phenomena  involving  low-  to  medium-defect  concentrations  (e.g.,  ag¬ 
gregation,  crack  initiation,  and  strain-induced  crystallization).  This  scale  exceeds 
current  computational  resources  for  molecular  dynamics  (MD)  simulations  by  a 
factor  of  1,000.  Ultra-high-molecular- weight  polyethylene  offers  a  classic  example 
of  the  scale  challenge:  despite  its  simple  chemical  makeup,  CnH2n+2  with  n 
100,000,  as  a  material  it  exhibits  large  regions  of  disorder  that  coexist  with  crys¬ 
talline  regions. 

Compounding  the  challenge,  fully  atomic  detail  is  required  for  performance-deter¬ 
mining  phenomena,  such  as  vacancies,  dislocations,  and  crack  tips;  consequently, 
MD  simulations  represent  the  state  of  the  art  for  computational  design  of  materials 
from  first  principles.  In  contrast,  for  crystalline  materials  one  may  use  multiscale 
models  that  couple  atomistic  and  continuum  theories,  e.g.,  the  quasi-continuum 
model  and  the  finite-element  atomistic  method. 

We  now  highlight  the  main  computational  challenges  faced  by  MD  methods.  First, 
high-frequency  events,  such  as  bond  vibrations,  dominate  the  state  space  sampled 
during  an  MD  simulation  (and  the  computational  work  performed  in  the  process). 
Slower  processes,  such  as  torsions  and  translations,  are  usually  the  more  relevant 
ones.  The  polymers’  long  chain  lengths,  and  low  defect  concentrations,  compound 
these  effects  by  requiring  a  large  number  of  degrees  of  freedom  (DoFs).  Similarly, 
the  size  of  the  crystalline  or  disordered  regions  is  frequently  too  large  for  fully 
atomistic  MD  simulations,  and  hence  MD  is  generally  restricted  to  homogeneous 
systems,  i.e.,  fully  crystalline  or  wholly  amorphous  materials.  A  polymer  melt  with 
N  particles  requires  0(N3)  time  to  equilibrate,  making  simulation  essentially  im¬ 
possible  without  advanced  simulation  approaches. 

Coarse-graining  is  an  effective  approach  to  reduce  simulation  cost.  Coarse-grained 
(CG)  simulations  are  usually  faster  than  their  atomistic  counterparts  due  partially  to 
the  reduced  DoFs  and  partially  to  an  increased  time  step  enabled  by  the  elimination 


1 


of  high-frequency  vibrations.  CG  methods  generally  reduce  predefined  chemical 
groups,  or  sets  of  atoms,  to  a  single  “bead,”  and  the  assignment  of  atoms  and  the 
bead  properties  are  developed  through  chemical  intuition  and  careful  testing.  The 
force-matching  method  of  Lu  et  al.1  and  the  reversible  CG  method  for  phenolic 


polymers  by  Harmandaris  et  al.2  have  greatly  impacted  accessible  simulation  scales 


for  these  materials.  Other  groups  have  used  wavelet  bases  to  construct  potentials 
that  are  efficiently  evaluated.3  However,  the  human  insights  required  at  the  atom¬ 
istic  level  makes  it  difficult  to  develop  higher-level  CG  models,  which  are  essential 
to  reach  the  desired  10- pm  length  scale  and  commensurate  time  scales.  Further¬ 
more,  these  CG  approaches  introduces  the  difficult  problem  of  generating  consistent 
atomistic  reconstructions,  because  particle-like  coarse-grained  beads  lose  informa¬ 
tion  about  the  particles  they  subsume.  Few  models  provide  the  on-the-fly  adaptivity 
required  for  important  problems,  such  as  modeling  crack  initiation  and  propagation, 
or  interfacial  phenomena. 

In  the  present  work,  we  introduce  a  wavelet-based  approach  to  extend  the  work 
of  Ismail  et  al.,4,5  which  provides  a  consistent  and  systematic  framework  to  derive 
multiple  levels  of  model  resolution  while  also  reducing  simulation  complexity.  Im¬ 
portantly  for  the  dynamical  and  nonequilibrium  metrics  of  interest,  this  approach 
captures  molecular  information  relevant  to  kinetics  in  addition  to  thermodynam¬ 
ics.  Our  approach  is  tied  strictly  to  the  underlying  physics  and  has  the  potential 
to  increase  accessible  simulation  sizes  and  durations  by  multiple  orders  of  magni¬ 
tude.  Earlier  wavelet  approaches  for  MD  addressed  time-series  analysis,  a  classical 
application  of  wavelet  techniques,  and  did  not  explore  model  acceleration  and  ap¬ 
proximation. 

2.  Methodology 

The  foundations  of  MD  lie  in  the  application  of  Newtonian  mechanics  to  the  energy 
functional 


(1) 


where  x  are  particle  positions,  M  is  the  diagonal  matrix  of  particle  masses,  and  V 
is  the  applied  potential.  For  the  macromolecular  systems  we  are  interested  in,  V  is 
usually  partitioned  into 
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V bond  is  generally  a  pairwise  harmonic  potential: 

Vhond(x)  =  lkij(\\xi  ~  Xj\\ -  rkj))2. 

i¥=j 

The  matrix  K  =  is  highly  sparse.  The  atoms  and  K  define  a  graph 

in  which  the  atoms  are  the  vertices  and  an  edge  between  atom  i  and  j  has 
weight  from  Kl} .  The  maximum  degree  of  a  vertex  in  organic  materials  is  4, 
and  even  in  organometallic  complexes  the  coordination  number  is  unlikely  to 
exceed  6. 

Vangie  is  a  3-body  potential.  Several  functional  forms  are  common  for  angle  poten¬ 
tials  complicating  not  only  implementation  but  also  choice  of  approximation. 
In  all  cases,  the  contributions  vary  much  more  slowly  than  in  bonds,  and  the 
energies  from  angle  interactions  are  typically  an  order  of  magnitude  smaller 
than  bond  energies. 

V torsion  is  a  4-body  potential  that  includes  the  dihedral  and  improper  plane  defor¬ 
mation  interactions.  Force  constants  tend  to  be  2  to  3  magnitudes  lower  than 
bond  constants. 

Vnon-bonded  includes  all  nonbonded  interactions,  such  as  electrostatics  and  van 
der  Waals  terms. 

The  matrix  K  is  highly  sparse.  The  atoms  and  K  define  a  graph  in  which  the  atoms 
are  the  vertices  and  an  edge  between  atom  i  and  j  has  weight  from  Kiy  The  maxi¬ 
mum  degree  of  a  vertex  in  organic  materials  is  4,  and  even  in  organometallic  com¬ 
plexes,  the  coordination  number  is  unlikely  to  exceed  6.  From  Eq.  1,  we  derive 
equations  of  motion, 

M  5f  —  Mx  —  —W  (. x )  =  —M^AM^x  —  M^V{M  ?x),  (2) 

where  A  :=  A/A  (diag(/f  1)  —  K)M~  2 ,  1  is  the  all  ones  vector,  and  diag  maps  a 
vector  to  its  corresponding  diagonal  matrix.  By  letting  r  —  M 2  x,  the  kinetic  energy 
reduces  to  |]  |r|  |2  and  the  equations  of  motion  become 

r  =  —A  r  —  V  (r). 
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If  V  =  0,  the  system  can  be  solved  analytically, 

( 0 

/r(i)\  J-A  oj7ro\ 

\f(t)J  y°J 

Since  A  is  positive  semi-definite,  the  eigenvectors  f/t  of  A  oscillate  with  frequen¬ 
cies  ±u>i,  where  c of  is  the  corresponding  eigenvalue  and  the  sign  corresponds  to 
the  phase  of  fJt.  A  and  its  eigenvector  matrix  U  are  natural  candidates  for  defining 
wavelets,  replacing  the  Laplace  operator  and  the  Fourier  transform  in  conventional 
wavelet  theory.  If  V  is  small  compared  to  the  harmonic  contribution,  this  solution 
is  amenable  to  perturbation  theories,  and  interactions  between  harmonic  modes  of 
strongly  differing  frequencies  will  be  low.  Furthermore,  in  regular  molcular  dynam¬ 
ics,  the  maximum  time  step  for  all  DoFs  is  limited  by  the  Nyquist  sampling  rate  of 
the  highest  frequency.  If  V  is  not  small,  then  another  reference  needs  to  be  used.  In 
the  following,  it  is  assumed  that  V  is  either  small  or  can  be  properly  localized. 

2.1  Basic  Multiresolution 

Here  we  introduce  the  employed  wavelet  transform  and  its  derivation.  We  use  the 
multiresolution  analysis  for  diffusion  wavelets  as  introduced  by  Coifman  and  Mag- 
gioni6  In  essence,  the  multiresolution  decomposition  partitions  the  eigenvalues  and 
eigenvectors  of  A,  effectively  coupling  high  frequencies  in  the  time  domain  strongly 
to  high-frequency  eigenvectors  of  A  in  the  “particle”  domain.  This  is  a  very  impor¬ 
tant  point  for  the  feasibility  of  the  approach  because  not  only  can  DoFs  be  reduced, 
but  simultaneously  the  time  step  may  be  increased. 

The  multiresolution  scheme  relies  on  the  filter  T  and  an  accuracy  operator  P£  that 
projects  eigenvectors  of  a  matrix  A"  e  span{T2"|n  e  N}  with  associated  eigen¬ 
value  less  than  e  >  0  to  zero.  The  recursively  defined  vector  spaces 

I4+i=  P£(T2n+1)Vn  and  (3) 

Wn+1  =  kerPe(T2"+1|yJ  (4) 

are  iteratively  associated  with  orthonormal  bases  via  QR-decompositions, 

7~>2n  A)  7D 

-*■  ^ccnr^ri’) 
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where  Qn  is  unitary,  Rn  is  upper  triangular,  and  both  are  dependent  on  the  basis 
used  to  express  T2'  .  At  each  iteration,  the  operator  is  cast  in  the  Q  basis  of  the 
previous  iteration, 


(n^T-1^)  =  Qn+lRn+l- 
i=0  i=0 

Therefore,  repeated  application  (in  infinite  precision)  is  equivalent  to  the  QR  algo¬ 
rithm  for  finding  eigenvalues.  We  separate  Qn  into  fI>„  and  T„,  where  the  latter  are 
the  columns  of  Qn  for  which  the  corresponding  rows  of  Rn  have  a  norm  below  s. 
<f>n  collects  the  remaining  columns  of  Qn.  Thus, 


n—  1 


n— 1 


<n<wTT-(n<?.) 


ilQARlQl® 


i= 0  i=0 


Since  T  is  positive  definite  and  WT]]^  =  1,  the  squaring  introduces  a  de  facto 
projection  operator  P£  via  the  machine  precision.  It  is  this  projection  that  makes 
the  difference  between  the  QR  algorithm  and  the  wavelet  decomposition.  Wn  is 
spanned  by  eigenvectors  of  T  whose  eigenvalues  obey 


£.1/2"— x 


<  A  <  e1'2'1 


(5) 


Since  the  low-pass  filter  T  =  I  —  A/ C,  where  C  is  a  sufficiently  large  constant, 
the  frequency  range  AA  =  £l/l'2"  —  s1/2”  1  =  A  A"  (f  —  R/2"  j  f0r  is  drawn  ever 
tighter  with  each  iteration,  and  A  —  1  — >  0.  C  =  maxx  |  j  A 1 1  ^  would  generally  be 
optimal  computationally  as  'Ti  R  0.  One  advantage  of  this  approach  is  the  inherent 
permutational  invariance  of  the  wavelet  spaces  and  its  ability  to  deal  with  arbitrary 
matrices  T  with  WTW^  <  1. 

C  =  maxx  1 1 A |  loo  would  generally  be  optimal  computationally  as  \Pi  R  0.  Eigen¬ 
values  for  graph  Laplacians  are  known  to  lie  in  |0.  2  max,:  Attj.7  We  have  inves¬ 
tigated  a  variety  of  estimates,  but  so  far  the  conservative  estimate  of  the  largest 
eigenvalue  Smax  of  A  by  min{2  max*  A&,  1 1  A|  |}  has  served  best. 

Assuming  that  eigenvalues  are  distributed  approximately  quadratically,  £  =  \  would 
lead  to  log2  N  scales,  where  N  is  the  dimensionality  of  A.  This  has  not  served  well 

as  too  many  DoFs  are  lumped  together  due  to  issues  discussed  below.  Instead,  a 

1 

higher  resolution  of  £  Machine  useA  and  although  this  wastes  some  computation  on 
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the  first  few  iterations,  it  is  equivalent  to  choosing  a  tolerance  on  the  scale  of  5max, 

_  2-m  ~  1  A 

£  effective  —  £ machine  ~  -t  —  0max. 

All  MD  simulations  were  performed  in  LAMMPS.8  A  crystalline  model  of  polyethylene 
(PE)  consisting  of  100,800  atoms  was  simulated  with  1-fs  time  steps  and  PCFF9 
from  100  K  to  500  K  in  50-K  increments.  A  201,936  atom  model  of  amorphous 
PE  was  constructed  from  a  cubic  supercell  in  Materials  Studio.  We  used  FAMMPS 
with  3-fs  time  steps  and  OPES1011  for  alanine  dipeptide. 

3.  Results 

We  looked  at  other  avenues  to  derive  wavelets  from  A.  One  choice  is  to  define  a 
shift  operator  a  in  the  eigenbasis  of  A  ordered  by  eigenvalue,  i.e.,  an  eigenvector 
Ui  with  eigenvalue  A*  is  mapped  to  an  eigenvector  of  A,+  i ,  and  a  dilation  operator 
D  that  projects  every  other  eigenvector  to  zero.  We  found  no  way  to  derive  these 
without  a  full  computation  of  the  eigenproblem.  Similarly,  attempts  at  defining  a 
fast  transform  via  the  filter  algebra1213  generated  by  T  failed  because  it  required  a 
full  solution  to  the  eigenproblem. 

Since  A  is  used  to  define  the  wavelets,  we  discuss  its  properties  in  greater  detail. 
These  properties  have  a  major  impact  on  the  performance  of  the  wavelet  transform. 
Degeneracies  in  A  reduce  the  effectiveness  of  the  wavelets  based  on  A.  More  in¬ 
formation  from  the  potential  V  needs  to  be  incorporated  to  take  further  advantage. 
Degeneracies  are  rampant  in  large  linear  homopolymers,  but  their  repetitive  struc¬ 
ture  makes  for  particularly  simple  solutions. 

T  and  A  share  the  same  eigenvectors,  albeit  with  reversed  order  of  eigenvalues, 
so  we  restrict  our  discussion  to  A.  A  is  positive  semi-definite,  and  for  each  set  of 
indices  J,  A =  0,  where  A jj  is  the  square  submatrix  of  A  with  indices  in 
J,  and  1  j  is  the  vector  of  ones  on  indices  in  J  and  0  otherwise.  It  is  possible  to  block 
triagonalize  A  using  transpositions  only  with  diagonal  blocks  A  j  j  and  off-diagonal 
blocks  A  jtK,  where  J  and  K  are  disjoint  and  J,  K.  J  U  K  are  contiguous  index  sets. 
Without  loss  of  generality,  let  j  <  k\/j  G  ./.  k  G  K.  rank  A  JK  is  generally  low 
because  of  the  low  maximum  degree  of  a  vertex  in  A.  If  A  =  span  A^j  and  A  jj 
has  a  nontrivial,  invariant  vector  space  T  perpendicular  to  A,  then  E  is  localized 
to  indices  preceding  K .  Examples  include  linear  homopolymers  discussed  in  detail 
below,  but  also  disconnected  graphs  from  individual  molecules. 
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Linear  homopolymers  can  be  ordered  to  have  block  tridiagonal  structure  where  each 
nonterminal  block  is  a  constant  m  x  m- matrix  for  the  off-diagonal  B  and  diagonal 
A,  respectively, 

A  B* 

B  •••  ••• 

'  • .  '  • .  B* 

B  A 

Furthermore,  the  off-diagonal  block  B  consists  of  a  single  nonzero  entry.  The  recur¬ 
sive  structure  implies  that  A  for  a  linear  homopolymer  of  n  repeat  units  can  be  re¬ 
ordered  by  a  permutation  re  of  indices  such  that  kt Are  =  A®/n+5®£n+£?TCg)£^, 
where  In  is  the  n  x  n  identity  matrix,  and  £n  is  the  n  x  n-matrix  with  all  ones  on 
the  first  subdiagonal  only  and  zeros  elsewhere.  We  assume  that  the  single  nonzero 
entry  in  B  is  lf\,  connecting  only  the  first  entries  of  the  blocks  on  the  diagonal.  In 
this  case,  A  takes  the  simple  form 


yAm\In 


Al^In  '  ' 

^4 1,777.-^71 

A-zgl-n  ‘  ' 

^4l,m4i 

-A  T 

-'-*-771,771 71 

If  U  is  a  unitary  matrix  of  eigenvectors  of  the  tridiagonal,  symmetric  Toeplitz  matrix 
C  with  eigenvalues  =  Altl  +  2| 771;1 1  cos(m / (m  +  1)),  the  block  diagonal  unitary 
matrix  Im  8)  U  transforms  A  into  a  square  block  matrix,  for  which  every  block 
is  square  and  diagonal.  The  eigensystem  of  A  then  comprises  eigensolutions  to  the 
matrices  A  +  ( rji  —  A\A)e\e[ .  It  follows  that  any  eigenvector  v  of  A  such  that  Vi  =  0 
has  an  n-fold  degenerate  eigenvalue. 


A  often  has  many  highly  degenerate  and  localized  eigenvalues.  This  fact  constitutes 
a  problem  since  A  provides  no  further  insight  into  how  to  subdivide  these  subspaces 
within  the  given  wavelet  scheme  and  the  external  potential  V  is  not  small.  Also, 
numerical  solutions  are  poorly  defined  as  any  unitary  transform  of  the  subspace  is 
an  eigensolution  to  A. 


Example.  Any  CH2  group  has  an  associated  medium- frequency,  highly  localized 
eigenvector  of  A. 
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This  follows  from  the  fact  that  the  hydrogens  are  leafs  on  the  graph,  i.e., 


(  2  A  ch 

-Kch 

-Kch\ 

A  ch2  = 

-KCh 

KCh 

0 

\-Kch 

0 

Kch  ) 

If  the  hydrogens  are  identified  with  indices  i  and  j,  then  ^  (e* — ej)  is  an  eigenvector 
with  an  associated  frequency  of  a  CH  vibration. 

To  address  the  degeneracy  problem,  we  have  considered  using  alternate  relation¬ 
ships.  One  approach  is  to  compute  the  conjugacy  class  of  T  and  to  choose  a  phys¬ 
ically  relevant  matrix  to  lift  the  degeneracy.  Some  symmetries  have  known  algo¬ 
rithms  with  decent  performance,  such  as  searching  for  disconnected  subgraphs.  But 
in  general,  it  is  prohibitively  costly  to  find  arbitrary  symmetries  and  then  to  choose 
a  relevant  one,  since  there  are  n\  possible  permutations  of  n  indices  alone. 

Since  the  QR  decomposition  is  sensitive  to  the  permutations  of  the  matrix,  differ¬ 
ent  bases  TQ  of  Wn  are  found  depending  on  preconditioning.  This  is  of  particular 
importance  for  the  high-frequency  modes  and  highly  degenerate  frequencies.  So, 
another  route  to  approaching  the  degeneracy  issue  was  to  use  preconditioning  of 
T  to  bias  toward  localized  solutions  in  the  original  basis,  e.g.,  balancing  distance 
of  nonzeros  to  the  diagonal  or  backloading  high-degree  vertices  in  the  QR  scheme. 
This  lead  to  mixed  results  and  was  deemed  no  more  reliable  than  the  standard  de¬ 
composition.  Efforts  using  V  to  address  the  basis  problem  within  are  ongoing. 

3.1  Adaptive  Multiresolution 

Since  one  goal  of  the  project  is  to  retain  as  little  information  as  necessary,  yet  to 
reintroduce  information  when  it  is  needed  on  the  fly,  the  following  discusses  the 
technical  and  fundamental  issues  of  reconstruction.  Reconstruction  is  the  process 
of  adaptively  reintroducing  detail  after  dropping  information  in  coarse-graining.  We 
find  that  reconstruction  is  systematically  possible  for  numerical  as  well  as  analytical 
coarse-graining  hierarchies. 

3.1.1  Reconstruction  Theory 

In  order  to  discuss  reconstruction  properly,  it  is  first  necessary  to  put  coarse-graining 
into  a  wider  context.  In  general,  a  coarsening  7  :  a  — >  (3  is  a  continuous  sur¬ 
jection  between  2  topological  state  spaces  a  and  (3  that  can  be  parameterized  by 
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n  >  m  state  variables,  respectively.  If  a  is  dressed  with  a  probability  measure 
PQ  :  {X  (Z  ct}  — >  [0, 1]  to  produce  a  probability  space,  then  7  induces  a  proba¬ 
bility  space  on  (3  with  probability  measure  Pp{k)  =  Pa(7_1(/c))  where  k  (1  3  and 
7-1(/c)  C  a  is  the  preimage  of  k.  It  is  thus  possible  to  select  (reconstruct)  a  precur¬ 
sor  for  a  state  b  e  (3  by  sampling  7_1({&})  with  Pa  via  the  conditional  probability 
P(a\b)  =Pa({a}nT\b))/Pa^~\b)). 

In  MD,  the  state  spaces  consist  of  the  positions  and  their  associated  momenta  and 
thermodynamic  state  variables,  such  as  temperature  or  pressure.  The  probability 
distributions  are  Boltzmann  distributions  that  depend  on  the  studied  thermodynami¬ 
cal  ensemble.  In  a  sequence  of  coarsenings  (7 n,  f3n  ),  it  is  generally  not  cost-effective 
to  sample  in  the  largest  space  a,  and  analytical  derivations  for  Pgn  are  rarely  avail¬ 
able.  In  such  cases,  approximations  need  to  be  made.  Common  solutions  in  the  MD 
community  are  probability  measures  from  iterative  Boltzmann  inversion  or  (succes¬ 
sive)  force  matching.  Hierarchical  iteration  thereby  produces  not  only  probability 
distributions  on  the  coarser  space,  but  also  conditional  probability  distributions  for 
a  fine  state  mapping  to  a  coarse  state.  Furthermore,  the  probability  distributions  can 
be  used  to  indicate  when  a  previously  undersampled  coarse  state  subspace  is  en¬ 
countered,  e.g.,  using  an  expected  improvement  measure  of  the  potential  —  lnP^ 
based  on  the  sampled  points,  for  which  the  trust  boundaries  can  be  precomputed. 

We  now  consider  hierarchical  coarsenings.  Let  f3n  '■  Rm"  - >  Pn  denote  a  pa¬ 
rameterization  of  pn.  A  coarsening  is  separable  if  there  exist  parameterizations 
Pn  (ti'\ •  •  • ,  trnh ^  and  PTn+1  (t!n+1), . . . ,  fmS)  of  pn  and  pn+1,  respectively,  such 
that  there  exists  a  partition  of  indices  (£™)i=i...0m  €  R“m,  {t™)i=a+ i...m  € 
collected  as  Am  and  Bm,  and  continuous  surjective  mappings  :  An  — >  An+ 1 
and  fiBn  :  Bn  —>  Bn+1  with 

PniA^A),  AbIPb))  =  %\p^+1{tA,tB)) 

for  t,\  €  An+ 1,  tB  G  Bn+ 1  ■  We  will  call  a  separable  coarsening  for  which  neither 
HAn  nor  Hb„  are  bijections  a  “fundamental  coarsening.”  Fundamental  coarsenings 
induce  intermediate  coarsenings.  The  state  space  An  x  Bn+ 1  is  an  intermediate  state 
space  with  the  coarsenings  7  :  Pl(tAnpBn)  ^  (tAn,  iiBn(tBn))  e  An  x  Bn+1  and 
i  '■  (tAn,tBn+ 1)  ^  Pn+i(VAn(tAn),tBn+1)  e  pn+i. 

A  sequence  of  separable  coarsenings  thereby  induces  a  hierarchy  of  coarsenings 
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and  associated  probability  measures,  induced  as  described  above.  In  particular,  the 
reconstruction  from  a  fundamental  coarsening  can  be  achieved  by  reconstructing  its 
separable  components  separately  and  independently  via  the  conditional  probabili¬ 
ties 


P(tAn  I  tAn+1  ,  tBn+1 )  =  Pfin  (  ({tAn  }  nfiAl  (tAn+l  )  )  X  UbI  (tBn+1 ) ) /Ppn+1  (tAn+1 ,  tBn+1  )  . 

It  is  noteworthy  that  these  intermediate  probability  distributions  are  available  in  an¬ 
alytical  as  well  as  numerical  settings,  since  a  fundamental  coarsening  has  to  include 
proper  statistics  for  the  intermediate  state  space  in  order  to  be  consistently  sampled. 
Recursive  application  of  conditional  probabilities  enables  concurrent  mixed  resolu¬ 
tions.  Since  the  construction  of  modes  from  the  multiresolution  analysis  produces  a 
hierarchy  of  frequencies,  it  induces  a  hierarchy  of  coarsenings  by  dropping  succes¬ 
sively  higher-frequency  modes,  i.e.,  by  applying  the  low-pass  filter  P£T2n . 

3.1.2  A  Priori  Approximations  for  Reconstruction 

Implementation  of  reconstruction  algorithms  as  discussed  above  requires  a  starting 
point.  In  the  following  paragraphs,  methods  are  proposed  for  finding  good  start¬ 
ing  points  based  on  A  and  other  molecular  information  that  is  available  prior  to 
simulation. 

To  second  order,  a  quadratic  potential  around  the  equilibrium  positions  of  the  trans¬ 
formed  coordinates  approximates  the  full  potential.  We  assume  dominance  of  har¬ 
monic  terms,  both  in  the  orginal  and  transformed  basis: 

Vhond(M~^Uf)  ™\{r-  r0)TUTAU(r  -  f0),  (6) 

where  f  is  the  vector  of  ||r?;||.  From  statistical  mechanics  the  root  mean  squared 
deviation  from  equilibrium  of  a  harmonic  oscillator  is 


where  A  is  the  force  constant.  In  other  words,  the  higher-frequency  components  are 
increasingly  found  close  to  their  energy  minima.  This  implies  that  finer  scales  only 
have  small  deviations  from  their  equilibrium  positions,  while  coarser  scales  may 
access  a  much  larger  space. 

We  start  by  approximating  1 1  CC  •i  3/j  |  j  by  a  Taylor  expansion  around  rff.  This  trans¬ 


it) 


forms  the  bond  potential  ViX)nd  into: 


Cbond  ~ 


Hence,  to  find  equilibrium  distances  for  r,  we  solve  the  minimization  problem 

mm  ]T  Cl3  (\\(M~*Ur0)i-  (M-*Ur0)j\\2  -  ,  (7) 


where  Qj  = 

rij 

Example.  The  2  nonzero  eigenvalues  of  H20  correspond  to  a  unique  solution  for 
reconstructing  H20. 


The  harmonic  Laplacian  for  H20, 


Kh2o  — 


(  2  A  oh /trio 
1 

—Koh/itIq 

1 

\—KOH/mQ 


- KOH/m £ 
Koh 
0 


- KOH/m ^ 
0 

Koh  j 


shares  the  same  structure  as  CH2.  The  eigenvalues  A0.i,2  of  this  simple  matrix  are  0, 
Koh ,  and  (1  +  2/m0)K0H,  respectively.  Since  there  are  only  2  independent  DoFs 
aside  from  the  center  of  mass,  r®  is  a  2-dimensional  system.  In  1  dimension,  Eq.  7 
has  2  solutions  for  this  simple  system;  r ^  =  \/2roH,  r20)  =  0,  which  leads  to  a 
symmetric  linear  molecule,  and  f  ,0:  =  0,  r20)  =  r0H  v/^2m°1 .  In  2  dimensions,  an 

2m  q  2  +)7ig 

angle  between  these  2  solutions  can  be  assigned.  At  90°,  the  equilibrium  structure 
of  water  is  recovered. 

Example.  HCN 


The  harmonic  Laplacian  for  HCN, 


Khcn 


{  Kch+Kqn 
me 

Kch 

1 

mc 
Kqn 
1  1 

\ 


KCh 
- r~ 

mc 

Kch 

0 


Kcn^  \ 


Kqn 

mN 


is  no  longer  as  simple  as  for  H20,  nor  are  the  eigenvalues  except  0  simple;  for 
the  generalized  Amber  force  field  (GAFF),  they  are  14.7  and  42.8.  The  numerical 
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GAFF  mode  distances  in  1  dimension  are  rf}  =  3.26,  =  0.66  and  = 

2.61,  =  —1.32.  Knowing  that  the  equilibrium  angle  between  the  2  modes  is  ca. 

80°  selects  the  first  solution  to  reconstruct  the  equilibrium  structure. 

In  both  examples  it  was  necessary  to  include  angle  information  to  make  the  best 
choice.  The  numerical  solution  to  Eq.  7  can  be  computed  efficiently  using  a  variety 
of  nonlinear  least-squares  algorithms,  but  more  direct  methods  are  still  under  inves¬ 
tigation.  Similar  derivations  are  possible  for  angle  potentials  and  are  under  current 
investigation. 

3.2  Simulation  Results  and  Analysis 

We  witnessed  this  behavior  in  computations  of  PE  as  well  as  polyglucose.  Fur¬ 
thermore,  when  applied  to  the  alanine  dipeptide  model,  commonly  used  for  testing 
coarse-graining  methods  for  proteins,  the  wavelet  decomposition  correctly  identi¬ 
fied  the  various  chemical  groups  at  the  finer  scales,  while  the  coarse  scales  captured 
the  partitioning  corresponding  to  single  bond  rotations.  See  Fig.  1  for  an  illustration 
on  tetra-l,4-D-glucose  and  Fig.  2  for  a  demonstration  of  informational  content  in 
the  wavelet  DoFs  for  PE. 

Fig.  3  shows  the  superimposed  Fourier-transformed  time  series  of  1,000  atoms  from 
the  500-K  trajectory,  which,  due  to  the  high  temperature,  shows  the  highest  mobility 
of  atoms.  Although  the  zero  frequency  is  by  far  the  most  intense  signal  (and  is 
omitted  from  the  figure  for  clarity),  a  wide  range  of  other  frequencies  is  active,  most 
importantly  around  0.45  PHz,  which  limits  the  time  step  of  atomistic  simulations  of 
PE  to  less  than  1.2  fs.  On  the  other  hand,  Fig.  4  shows  the  effects  of  scales  on  time 
series  analysis. 


Fig.  1  Heat  map  of  selected  wavelet  degrees  of  freedom  (DoFs)  in  1,4-D-glucose  tetramer.  Blue 
denotes  positive  coefficients,  red  negative.  From  left  to  right:  fine-grained  wavelet  DoF  cover¬ 
ing  a  single  repeat  unit;  coarse-grained  wavelet  DoF  covering  1  half  of  the  oligomer;  coarser- 
wavelet  DoF  covering  the  oligomer  isolating  repeat  units  with  sign  changes;  coarsest  wavelet. 
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Fig.  2  Selected  reduced  wavelet  representations  of  a  polyethylene  crystal.  From  a)  to  d) 
wavelet  information  is  successively  added,  a)  Coarse  representation  highlights  the  anisotropy 
in  1-dimensional  chain  averaged  over  all  chain  segments;  b)  coarse  representation  of  the  2- 
dimensional  chain  plane;  c)  3-dimensional  representation;  and  d)  full  resolution. 


Fig.  3  Fourier  transform  of  the  y-component  of  1,000  atoms  in  crystalline  polyethylene  (100,800 
atoms)  excluding  zero  frequency  to  allow  detail  at  other  frequencies.  Molecular  dynamics  at 
500  K  and  1  atm.  Left:  Individual  power  spectra  per  atom.  Right:  Power  spectrum  of  magni¬ 
tude  of  optimal  representation. 


Fig.  4  Fourier  transform  of  the  y-component  of  a  100,800-atom  crystalline  polyethylene  system 
sampled  at  1  fs.  3  scales  are  shown:  1st  (left,  finest  scale),  5th  (middle),  and  12th  scale  (out  of 
25,  right). 


13 


The  finest  scale  out  of  25  still  retains  the  high-frequency  components  (top  left),  as 
may  be  expected,  but  they  are  much  less  intense  than  the  remaining  modes.  Travers¬ 
ing  the  scales,  we  note  that  a  decreasing  number  of  DoFs  at  the  coarser  scales  shows 
significant  peaks  at  all.  At  the  medium  scale  (top  right),  no  high-frequency  compo¬ 
nents  are  found  anymore.  Therefore,  medium-scale  and  coarser  DoFs  are  quasi¬ 
static  compared  to  the  finer  scales.  Furthermore,  the  signals  are  clearly  separated 
and  very  sharp  despite  the  high  temperature,  which  speaks  for  strongly  decoupled 
modes  and  justifies  dropping  the  finer  scales,  which  in  turn  facilitates  speed-up  by 
not  only  reducing  DoFs  but  also  increasing  the  propagation  time  step. 

The  same  is  witnessed  for  alanine  dipeptide  and  polyglucose  (not  shown).  Fig.  5 
shows  the  power  spectra  for  all  22  DoFs.  Clearly,  a  large  number  of  frequencies 
are  active  for  all  atoms.  The  spectrum  of  optimal  representation  shows  some  clear 
peaks  around  0.13  and  0.10  PHz.  But  in  Fig.  6,  the  0.13-PHz  peak  is  absent  after  the 
2nd  scale,  and  the  0.10-PHz  peak,  including  noise  down  to  ca.  0.06  PHz,  vanishes 
from  the  4th  scale  on  through  the  remaining  4  scales. 


0.08  0.12 

co/PHz 


0.20 


co/PHz 


Fig.  5  Fourier  transform  of  the  z-component  of  alanine  dipeptide  in  vacuum  excluding  zero 
frequency  to  allow  detail  at  other  frequencies.  Molecular  dynamics  at  500  K  and  1  atm.  Left: 
Individual  power  spectra  per  atom.  Right:  Power  spectrum  of  magnitude  of  optimal  represen¬ 
tation. 
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co/PHz  co/PHz 


Fig.  6  Fourier  transforms  of  alanine  dipeptide  degrees  of  freedom  (DoFs)  time  series  in  vacuum 
excluding  zero  frequency  to  allow  detail  at  other  frequencies.  Molecular  dynamics  at  500  K 
and  1  atm.  Upper-left:  z-component  of  optimal  representation;  Upper-right:  2nd  finest  optimal 
wavelet  DoF  z-component;  Lower-left:  3rd  finest  optimal  wavelet  DoF  z-component;  Lower- 
right:  4th  finest  optimal  wavelet  DoF  z-component. 


We  investigated  the  correlation  matrices  of  the  various  coordinate  systems  to  quan¬ 
tify  the  extent  of  interdependence.  Fig.  7  shows  the  correlation  matrices, 

f  x*(t)xj(s  —  t)dsdt 

(f  x*(t)xi(s  —  t)dsdt  f  x*(t)xj(s  —  t)dsdt ) 2 

where  Xi(t)  is  a  time-dependent  coordinate,  as  heat  maps.  As  may  be  expected,  all 
particle  coordinates  are  strongly  correlated,  but  in  both  the  water  case  as  well  as 
the  alanine  dipeptide  case,  much  less  correlation  is  witnessed  for  the  wavelet  DoFs 
(Fig.  7,  right),  despite  the  few  DoFs  involved  in  these  small  systems.  The  extent 
of  overall  correlation  can  be  assessed  by  the  ('i-norm  of  the  correlation  matrices, 
ca.  28  versus  ca.  21  for  water,  and  ca.  315  versus  ca.  168  for  alanine  dipeptide. 
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Fig.  7  Top:  Heat  maps  of  the  degree  of  freedom  (DoF)  correlations  of  a  water  dimer  in  vacuum; 
Bottom:  Heat  maps  of  the  DoF  correlations  of  alanine  dipeptide  in  vacuum;  Left:  Correlations 
of  the  cartesian  DoFs;  Right:  Correlations  of  the  wavelet  DoFs. 


A  further  indication  of  the  appropriateness  of  the  diffusion  wavelet  DoFs  is  the 
fx-norm  of  structures  and  forces  in  a  given  representation  basis.  The  f i  -norm  is 
bounded  by  the  £2-norm  for  any  orthonormal  basis,  and  there  exists  at  least  one 
coordinate  system  in  which  ||x||i  =  [ [a:[ [2.  These  coordinate  systems  optimally 
describe  a  state  x. 

We  compared  the  f  j-norm  in  the  XYZ  coordinate  system,  the  non-lossy  wavelet 
coordinate  system,  and  the  optimal  coordinate  system  on  each  scale.  For  alanine 
dipeptide,  a  reduction  by  a  factor  of  ca.  2  was  achieved  by  switching  to  wavelets 
(3  times  the  optimum),  while  the  optimal  wavelet  system  managed  to  reduce  to 
twice  the  optimum.  For  polyglucose,  the  full  wavelet  DoFs  reduce  the  fi-norm  to 
half  of  the  XYZ  coordinate  system,  and  the  optimal  wavelet  system  reduces  it  to 
within  20%  of  the  optimum.  For  crystalline  PE,  the  full  wavelet  DoFs  improve  the 
structural  /:rnorm  by  a  factor  of  ca.  7,  while  the  optimal  wavelet  representation 
comes  in  below  1%  of  the  XYZ  coordinate  system  to  a  factor  of  roughly  twice 
the  optimal  representation.  Similar  results  were  obtained  for  amorphous  PE  and 
nanocellulose. 
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4.  Concluding  Remarks 


We  have  characterized  and  applied  a  diffusion-wavelet-based  multiresolution  of 
DoFs  in  MD.  The  methodology  was  applied  to  unimolecular  systems  of  alanine 
dipeptide,  crystalline  PE,  and  an  oligomer  of  1,4-D-glucose.  In  all  cases,  a  clear 
and  strong  separation  of  time  scales  was  associated  with  individual  particle  scales. 
This  separation  was  evident  even  for  the  very  small  alanine  dipeptide  system.  Be¬ 
cause  the  particle  scales  are  derived  systematically  from  the  underlying  bonding 
topology  and  do  not  require  any  further  input,  the  wavelet  decomposition  system¬ 
atically  identifies  DoFs  for  conventional  coarse-graining  procedures.  The  proposed 
approach  goes  beyond  conventional  approaches  based  on  expert  knowledge;  how¬ 
ever,  the  wavelet  method  can  be  used  even  in  the  absence  of  expert  insight,  and 
furthermore,  can  be  used  on  all  length  scales,  including  the  smallest.  Future  work 
will  focus  on  exploiting  these  findings  to  speed  up  MD  a  priori. 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


CG 

coarse-grained 

DoF 

degree  of  freedom 

GAFF 

generalized  Amber  force  field 

MD 

molecular  dynamics 

PE 

polyethylene 
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